22 research outputs found

    Randomization algorithms for large sparse networks

    In many domains it is necessary to generate surrogate networks, e.g., for hypothesis testing of different properties of a network. Generating surrogate networks typically requires that different properties of the network are preserved, e.g., edges may not be added or deleted and edge weights may be restricted to certain intervals. In this paper we present an efficient property-preserving Markov chain Monte Carlo method termed CycleSampler for generating surrogate networks in which (1) edge weights are constrained to intervals and vertex strengths are preserved exactly, and (2) edge weights and vertex strengths are both constrained to intervals. These two types of constraints cover a wide variety of practical use cases. The method is applicable to both undirected and directed graphs. We empirically demonstrate the efficiency of the CycleSampler method on real-world data sets. We provide an implementation of CycleSampler in R, with parts implemented in C. Peer reviewed
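The first constraint type (edge weights kept within intervals, vertex strengths preserved exactly) can be illustrated with a small sketch. This is not the CycleSampler algorithm itself, whose details the abstract does not give; it only shows one kind of move that satisfies the stated constraints on an undirected graph. The function names `cycle_move` and `strengths`, the toy 4-cycle, and the bounds are all assumptions made for illustration. Adding `+delta` and `-delta` alternately along an even-length cycle leaves every vertex strength unchanged, and `delta` is drawn from the range that keeps each weight inside its interval.

```python
import random

def cycle_move(weights, cycle, bounds, rng):
    """Return new weights after one alternating +delta/-delta move on an even cycle.

    weights: dict edge -> weight; cycle: list of edges in cycle order;
    bounds: dict edge -> (low, high). All names here are illustrative.
    """
    signs = [1 if i % 2 == 0 else -1 for i in range(len(cycle))]
    lo, hi = float("-inf"), float("inf")
    for edge, s in zip(cycle, signs):
        a, b = bounds[edge]
        w = weights[edge]
        if s > 0:   # need a <= w + delta <= b
            lo, hi = max(lo, a - w), min(hi, b - w)
        else:       # need a <= w - delta <= b
            lo, hi = max(lo, w - b), min(hi, w - a)
    delta = rng.uniform(lo, hi)
    new = dict(weights)
    for edge, s in zip(cycle, signs):
        new[edge] = weights[edge] + s * delta
    return new

def strengths(weights):
    """Vertex strength = sum of the weights of the incident edges."""
    out = {}
    for (u, v), w in weights.items():
        out[u] = out.get(u, 0.0) + w
        out[v] = out.get(v, 0.0) + w
    return out

# Toy example (assumed data): a 4-cycle with unit weights bounded to [0, 2].
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
weights = {e: 1.0 for e in cycle}
bounds = {e: (0.0, 2.0) for e in cycle}
surrogate = cycle_move(weights, cycle, bounds, random.Random(0))
print(strengths(weights), strengths(surrogate))
```

Repeating such moves over many cycles gives a chain of surrogate networks that all share the original vertex strengths; how cycles are chosen and how mixing is ensured is left open here.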

    Guided Visual Exploration of Relations in Data Sets

    Efficient explorative data analysis systems must take into account both what a user knows and what the user wants to know. This paper proposes a principled framework for interactive visual exploration of relations in data, through views most informative given the user's current knowledge and objectives. The user can input pre-existing knowledge of relations in the data and also formulate specific exploration interests, which are then taken into account in the exploration. The idea is to steer the exploration process towards the interests of the user, instead of showing uninteresting or already known relations. The user's knowledge is modelled by a distribution over data sets parametrised by subsets of rows and columns of data, called tile constraints. We provide a computationally efficient implementation of this concept based on constrained randomisation. Furthermore, we describe a novel dimensionality reduction method for finding the views most informative to the user, which in the limit of no background knowledge and with generic objectives reduces to PCA. We show that the method is suitable for interactive use and is robust to noise, outperforms standard projection pursuit visualisation methods, and gives understandable and useful results in analysis of real-world data. We provide an open-source implementation of the framework. Peer reviewed

    Robust regression via error tolerance

    Real-world datasets are often characterised by outliers: data items that do not follow the same structure as the rest of the data. These outliers might negatively influence modelling of the data. In data analysis it is, therefore, important to consider methods that are robust to outliers. In this paper we develop a robust regression method that finds the largest subset of data items that can be approximated using a sparse linear model to a given precision. We show that this can yield the best possible robustness to outliers. However, this problem is NP-hard, and to solve it we present an efficient approximation algorithm, termed SLISE. Our method extends existing state-of-the-art robust regression methods, especially in terms of speed on high-dimensional datasets. We demonstrate our method by applying it to both synthetic and real-world regression problems. Peer reviewed
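The objective described above (the largest subset of items that a linear model fits to a given precision) can be made concrete with a toy sketch. This is not SLISE: it swaps in a naive exhaustive search over lines through pairs of points, works in one dimension only, and ignores sparsity. The names `inliers` and `largest_subset_line`, the tolerance `eps`, and the data are assumptions for illustration.

```python
from itertools import combinations

def inliers(points, slope, intercept, eps):
    """Items whose residual under the line is at most eps."""
    return [(x, y) for x, y in points if abs(y - (slope * x + intercept)) <= eps]

def largest_subset_line(points, eps):
    """Try the line through every pair of points; keep the one covering most items."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, not expressible as y = slope * x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        subset = inliers(points, slope, intercept, eps)
        if best is None or len(subset) > len(best[2]):
            best = (slope, intercept, subset)
    return best

# Assumed toy data: eight points on y = 2x + 1 plus three outliers.
points = [(x, 2 * x + 1) for x in range(8)] + [(1, 10), (3, -5), (6, 30)]
slope, intercept, subset = largest_subset_line(points, eps=0.1)
print(slope, intercept, len(subset))
```

The search ignores the three outliers entirely rather than down-weighting them, which is the sense in which a subset-based objective is robust. The pairwise search scales quadratically and does not generalise to high dimensions, which is where an approximation algorithm becomes necessary.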

    Significance of Patterns in Data Visualisations

    In this paper we consider the following important problem: when we explore data visually and observe patterns, how can we determine their statistical significance? Patterns observed in exploratory analysis are traditionally met with scepticism, since the hypotheses are formulated while viewing the data, rather than before doing so. In contrast to this belief, we show that it is, in fact, possible to evaluate the significance of patterns also during exploratory analysis, and that the knowledge of the analyst can be leveraged to improve statistical power by reducing the number of simultaneous comparisons. We develop a principled framework for determining the statistical significance of visually observed patterns. Furthermore, we show how the significance of visual patterns observed during iterative data exploration can be determined. We perform an empirical investigation on real and synthetic tabular data and time series, using different test statistics and methods for generating surrogate data. We conclude that the proposed framework allows determining the significance of visual patterns during exploratory analysis. Peer reviewed
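As a rough illustration of testing a pattern against surrogate data, the sketch below computes an empirical p-value with a standard permutation test. It is not the framework from the paper: the test statistic (absolute Pearson correlation), the surrogate generator (shuffling one column), the estimate `(extreme + 1) / (n + 1)`, and the names `pearson` and `empirical_p_value` are all assumptions chosen for illustration, and the handling of multiple simultaneous comparisons is not covered.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def empirical_p_value(xs, ys, n_surrogates, rng):
    """Share of surrogates (shuffled ys) at least as extreme as the observed data."""
    observed = abs(pearson(xs, ys))
    extreme = 0
    for _ in range(n_surrogates):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # breaks any relation between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed data as one of the samples.
    return (extreme + 1) / (n_surrogates + 1)

# Assumed toy data: a noisy linear trend that an analyst might spot in a scatter plot.
rng = random.Random(0)
xs = list(range(20))
ys = [x + rng.gauss(0, 1) for x in xs]
print(empirical_p_value(xs, ys, n_surrogates=999, rng=rng))
```

The smallest value this estimate can take is `1 / (n_surrogates + 1)`, so the number of surrogates bounds the resolution of the test.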

    Shifting of attentional set is inadequate in severe burnout: Evidence from an event-related potential study

    Individuals with prolonged occupational stress often report difficulties in concentration. Work tasks often require the ability to switch back and forth between different contexts. Here, we studied the association between job burnout and task switching by recording event-related potentials (ERPs) time-locked to stimulus onset during a task with simultaneous cue-target presentation and unpredictable switches in the task. Participants were currently working people with severe, mild, or no burnout symptoms. In all groups, task performance was substantially slower immediately after a task switch than during task repetition. However, the error rates were higher in the severe burnout group than in the mild burnout and control groups. Electrophysiological data revealed an increased parietal P3 response for the switch trials relative to repetition trials. Notably, the response was smaller in amplitude in the severe burnout group than in the other groups. The results suggest that severe burnout is associated with inadequate processing when rapid shifting of attention between tasks is required, resulting in less accurate performance. (C) 2016 Elsevier B.V. All rights reserved. Peer reviewed